Stable Architectures for Deep Neural Networks
Authors
Abstract
Deep neural networks have become valuable tools for supervised machine learning, e.g., in the classification of text or images. While offering superior results over traditional techniques at finding and expressing complicated patterns in data, deep architectures are known to be challenging to design and train so that they generalize well to new data. An important issue that must be overcome is numerical instability in derivative-based learning algorithms, commonly referred to as exploding or vanishing gradients. In this paper, we propose new forward propagation techniques inspired by systems of ordinary differential equations (ODEs) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is interpreting deep learning as a parameter estimation problem for nonlinear dynamical systems. Given this formulation, we analyze the stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
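The ODE view of forward propagation can be made concrete with a small sketch. The code below is a hedged illustration, not the paper's exact architecture: the layer width, step size `h`, `tanh` activation, and the antisymmetric parameterization `K = W - W.T` are illustrative assumptions. Interpreting a residual layer as one explicit Euler step of dY/dt = tanh(K(t)Y + b(t)), an antisymmetric K keeps the eigenvalues of the linearized dynamics on the imaginary axis, so signals neither blow up nor decay rapidly as depth grows.

```python
import numpy as np

def forward_euler_net(y0, weights, biases, h=0.1):
    """Forward propagation as an explicit Euler discretization of
    dY/dt = tanh(K(t) Y + b(t)), with K antisymmetric for stability."""
    y = y0
    for W, b in zip(weights, biases):
        K = W - W.T                 # antisymmetric: purely imaginary eigenvalues
        y = y + h * np.tanh(K @ y + b)
    return y

# Illustrative deep network: 50 layers of width 4.
rng = np.random.default_rng(0)
depth, dim = 50, 4
weights = [rng.standard_normal((dim, dim)) for _ in range(depth)]
biases = [rng.standard_normal(dim) for _ in range(depth)]
y = forward_euler_net(rng.standard_normal(dim), weights, biases)
print(y.shape)  # features survive 50 layers without exploding
```

Note the design choice: antisymmetry is enforced by construction (`W - W.T`) rather than by a penalty, so stability holds for any parameter values the optimizer visits, at the cost of restricting the solution space as the abstract notes.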
Similar resources
Integration of remote sensing and meteorological data to predict flooding time using deep learning algorithm
Accurate flood forecasting is a vital need for reducing its risks. Due to the complicated structure of floods and river flow, this problem is difficult to solve. Artificial neural networks, such as recurrent neural networks, offer good performance on time series data. In recent years, the use of Long Short-Term Memory networks has attracted much attention due to the shortcomings of recurrent ne...
Analysis of Deep Convolutional Neural Network Architectures
In computer vision, many tasks are solved using machine learning. In the past few years, state-of-the-art results in computer vision have been achieved using deep learning. Deeper machine learning architectures are better able to handle complex recognition tasks than previous, shallower models. Many architectures for computer vision make use of convolutional neural networks which ...
Reversible Architectures for Arbitrarily Deep Residual Neural Networks
Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success...
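Reversibility can be sketched with an additive coupling block (a RevNet-style construction; the residual functions `F` and `G` below are illustrative placeholders, not the cited paper's exact design). The input is split into two channels, each half updates the other, and every step can be inverted algebraically, so intermediate activations need not be stored for backpropagation.

```python
import numpy as np

def rev_block_forward(x1, x2, F, G):
    """Additive coupling: each half updates the other, so the map is invertible."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    """Recover the inputs exactly by undoing the two updates in reverse order."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = G = np.tanh  # illustrative residual functions; any maps of matching shape work
x1, x2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
y1, y2 = rev_block_forward(x1, x2, F, G)
r1, r2 = rev_block_inverse(y1, y2, F, G)
# r1 and r2 reproduce x1 and x2 up to floating point
```

Because the inverse performs the same arithmetic in reverse, memory cost per layer is constant in depth, which is what makes "arbitrarily deep" residual networks practical.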
Overview of Deep Neural Networks
In recent years, new neural network models with deep architectures have started to get more attention in the field of machine learning. These models contain a larger number of layers (hence "deep") than the conventional multi-layered perceptron, which usually uses only two or three functional layers of neurons. To overcome the difficulties of training such complex networks, new learning algorithms hav...
Exploring the Imposition of Synaptic Precision Restrictions For Evolutionary Synthesis of Deep Neural Networks
A key contributing factor to the incredible success of deep neural networks has been the significant rise of massively parallel computing devices, allowing researchers to greatly increase the size and depth of deep neural networks, leading to significant improvements in modeling accuracy. Although deeper, larger, or more complex deep neural networks have shown considerable promise, the computational comp...
Journal: CoRR
Volume: abs/1705.03341
Publication date: 2017